learning discrete distribution
Learning discrete distributions: user vs item-level privacy
Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over $k$ symbols with user-level differential privacy. If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/\epsilon\alpha)$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/\epsilon\alpha$ independent of the number of samples per user $m$. Moreover, we show that any mechanism that only operates on the final aggregate should require a user complexity of the same order. We then propose a mechanism such that the number of users scales as $\tilde{\mathcal{O}}(k/(m\alpha^2) + k/\sqrt{m}\epsilon\alpha)$ and further show that it is nearly-optimal under certain regimes. Thus the privacy penalty is $\tilde{\Theta}(\sqrt{m})$ times smaller compared to the standard mechanisms. We also propose general techniques for obtaining lower bounds on restricted differentially private estimators and a lower bound on the total variation between binomial distributions, both of which might be of independent interest.
Learning discrete distributions: user vs item-level privacy
Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over k symbols with user-level differential privacy. If each user has m samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be \mathcal{O}(k/(m\alpha 2) k/\epsilon\alpha) to achieve an \ell_1 distance of \alpha between the true and estimated distributions, with the privacy-induced penalty k/\epsilon\alpha independent of the number of samples per user m .
Review for NeurIPS paper: Learning discrete distributions: user vs item-level privacy
This paper examines user-level privacy in the context of learning discrete distributions. Near matching upper and lower bounds (with a corresponding mechanism) on the number of users m required for a desired level of total variation distance are established, while it is shown that natural baselines the Laplace and Gaussian mechanism achieve inferior performance by a factor of sqrt(m). User-level privacy is a variation on pure/approximate differential privacy in which a mechanism's response distribution must be indistinguishable not only to change of an individual item (record) but those items belonging to a user. The paper considers the cases of users contributing equal and unequal numbers of items. R3 highlights an important concern with the user-privacy definition required for the proposed mechanism: that the users' items are drawn i.i.d.
Learning discrete distributions: user vs item-level privacy
Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over k symbols with user-level differential privacy. If each user has m samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be \mathcal{O}(k/(m\alpha 2) k/\epsilon\alpha) to achieve an \ell_1 distance of \alpha between the true and estimated distributions, with the privacy-induced penalty k/\epsilon\alpha independent of the number of samples per user m .
Learning Discrete Distributions by Dequantization
Hoogeboom, Emiel, Cohen, Taco S., Tomczak, Jakub M.
Media is generally stored digitally and is therefore discrete. Many successful deep distribution models in deep learning learn a density, i.e., the distribution of a continuous random variable. Na\"ive optimization on discrete data leads to arbitrarily high likelihoods, and instead, it has become standard practice to add noise to datapoints. In this paper, we present a general framework for dequantization that captures existing methods as a special case. We derive two new dequantization objectives: importance-weighted (iw) dequantization and R\'enyi dequantization. In addition, we introduce autoregressive dequantization (ARD) for more flexible dequantization distributions. Empirically we find that iw and R\'enyi dequantization considerably improve performance for uniform dequantization distributions. ARD achieves a negative log-likelihood of 3.06 bits per dimension on CIFAR10, which to the best of our knowledge is state-of-the-art among distribution models that do not require autoregressive inverses for sampling.